Contextual Conditional Models for Smartphone-based Human Mobility Prediction

Trinh Minh Tri Do, Idiap Research Institute, Switzerland, do@idiap.ch
Daniel Gatica-Perez, Idiap Research Institute and Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, gatica@idiap.ch

ABSTRACT
Human behavior is often complex and context-dependent. This paper presents a general technique to exploit this "multidimensional" contextual variable for human mobility prediction. We use an ensemble method, in which we extract different mobility patterns with multiple models and then combine these models under a probabilistic framework. The key idea lies in the assumption that human mobility can be explained by several mobility patterns, each of which depends on a subset of the contextual variables and can be learned by a simple model. We show how this idea can be applied to two specific online prediction tasks: what is the next place a user will visit? and how long will he stay in the current place? Using smartphone data collected from 153 users during 17 months, we show the potential of our method in predicting human mobility in real life.

Author Keywords
prediction, user mobility, smartphone, mobile context

INTRODUCTION
Smartphones have become an attractive option for sensing human and social behavior. As phones are usually kept in relatively close proximity and contain many useful sensors that can record contextual and user activity cues (location, application usage, and calling behavior), they can be effectively used to capture and mine user behavior in everyday life. The availability of multiple data sources from smartphones also makes it possible to predict future user behavior based on sensed past user activities.

In this paper, we study the prediction of user mobility from smartphone data under a general predictive framework. We propose new algorithms for exploiting multiple contextual variables using a probabilistic approach that infers the conditional dependencies between contextual input variables and the output variables that correspond to predictions. For example, our approach estimates the conditional probabilities over the set of destinations that a user can go to, given his current context such as location and time. In particular, we address two fundamental tasks for mobility prediction: what is the next place a user will visit, and how long will he stay in the current place?

We formulate the mobility prediction problem as one of learning the conditional distribution of the output variables given multiple contextual input variables. Instead of defining a complex model that exploits all useful context sources, we develop a principled way to combine multiple mobility-related patterns. For a specific task (next place prediction, duration prediction), the framework focuses on finding relevant contextual features and on building single models that capture specific mobility patterns. The set of single models is then combined under a probabilistic approach. Another contribution of this work is the integration of general and personalized prediction models. Human mobility is highly personal, but user-independent patterns can still be extracted from the data and used as prior knowledge about mobility when making predictions. We show how this approach significantly improves the performance of a personalized predictive model. To our knowledge, this is a novel approach to exploit population-based patterns for predicting the mobility of specific persons.

We validate our approach using a large-scale data set involving 153 users in a 17-month data collection campaign. This longitudinal data allows us to study the predictability of user mobility from smartphone data collected in real-life conditions, in which users' locations are not always available. For the two prediction tasks, the proposed ensemble approach is shown to be effective in exploiting multiple dependencies between user context and mobility, leading to improved performance over single basic predictors. The experimental results also show that the predictability of user mobility depends on many factors. On one hand, there are situations in which human mobility cannot be predicted reliably. This happens because of a low number of observations of the same context, or because of the complexity of human behavior (e.g., the large number of options for lunch leads to low predictability of the restaurant the user goes to). On the other hand, repetitive mobility patterns can be learned and predicted reliably as long as the smartphone has collected enough observations.

The paper is organized as follows. The next section is dedicated to related work, followed by a description of the raw data and the place extraction process. Then, we present the general formulation of the prediction tasks and detail the models for the two specific prediction tasks. Finally, we provide experimental results and present some concluding remarks.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
UbiComp '12, Sep 5-Sep 8, 2012, Pittsburgh, USA.
Copyright 2012 ACM 978-1-4503-1224-0/12/09 $10.00.

RELATED WORK
Predicting human mobility has become an increasingly relevant topic in pervasive computing thanks to the improved ability to track people's location. Our work considers location data with state-of-the-art accuracy extracted from smartphones. It is also possible to get location data of lower resolution using other sensing methods. Song et al. studied the predictability of human mobility from very coarse location data (GSM tower IDs). WiFi access points are another popular data source for localizing and predicting user mobility. Recently, Vu et al. proposed a framework to extract places from WiFi records and to predict human movement using joint WiFi and Bluetooth traces.

The prediction of human mobility has also been considered in various settings. Examples include inferring a destination based on partial paths, predicting the location and the stay duration at a given time in the future, and estimating the probability that a person is present at a specific place at a given time. All these examples can be viewed as learning an input-output function from past observations. The input is the user context (partial path, current location, time), and the output is the variable we want to predict (destination, arrival time).

Previous work on human mobility prediction often considers multiple contextual data sources. For the combination of location and time, a simple yet effective method is to build a model for each location separately. More sophisticated methods typically require a carefully designed model, such as the spatio-temporal decision tree for mining trajectory patterns proposed in earlier work. Recently, Cho et al. proposed a model that combines spatial, temporal, and social relation information for predicting human movement. In these works, the combination method is task-specific and hard to extend to a large number of contextual variables. As an alternative, ensemble methods represent a promising direction, as recently addressed in the literature. That work, however, is limited to choosing a single best model rather than combining the (typically available) prior information from multiple models. The existing literature on mobility prediction thus points to a need for improved, principled ways to exploit and integrate the potentially large number of contextual variables available on mobile devices like smartphones.

RAW MOBILE DATA AND PLACE EXTRACTION
In this paper, we use data collected with Nokia N95 smartphones over a period of 17 months. The data collection campaign started in October 2009 in a European country. There are 153 participants, mainly students and professionals, plus a few others (retired people, housewives). Each participant carried the smartphone as their main and only phone, so the data was recorded in real-world conditions.

The data collection framework was based on a client-server architecture. Client software was installed on the smartphone to record data and upload it automatically to a server via WiFi. The client was programmed to record various data types (e.g., GPS, WiFi APs, call logs). Sampling rates were set dynamically according to the inferred state of the user (e.g., indoor-staying, outdoor-moving) in order to preserve battery life. This enables the phone to record data continuously, with the only restriction of charging the phone once a day. The location data comes from two sources: the GPS sensor and WiFi data. The estimation of WiFi AP locations was integrated in the sensing software, which looked for co-presence of APs and GPS data.

The raw location data was first transformed into a symbolic space. This space captures most of the mobility information and excludes actual geographic coordinates. To build it, we first detected visited places. We then mapped the sequence of GPS coordinates into the sequence of visits to these places, each represented by a place ID. This procedure has two main advantages:
i) the resulting high-level representation is simple to model and captures the most important information about user mobility;
ii) the model can predict user mobility without knowing the actual coordinates, which avoids the privacy issues related to location sharing.

Place discovery. The automatic extraction of places that people visit has been addressed in previous works, with different ways of defining places. We used a modified version of a recently proposed approach. The raw trajectory data is first used as input to detect stay points. A stay point is a circular geographic region in which the user stays for a significant amount of time (in our work, at least 20 minutes). All discovered stay points are then clustered into meaningful locations, called stay regions, using a grid clustering algorithm. This method divides the space with a uniform grid, where each cell is a square of side length D meters. The algorithm then finds places of 3×3 cells. It looks for the highest-density cell and assigns every unassigned stay point in the 3×3 region around that cell to a new stay region (see Fig. 1(a)). This process repeats until all non-empty cells are assigned.

A drawback of this method is that the discovered regions are not optimized to cover as many stay points as possible. In Fig. 1(a), for example, one point falls outside the place. We improved the place discovery algorithm by adding one step. Among the regions that cover the highest-density unassigned cell, we select the one that maximizes the number of covered stay points. Figure 1(b) illustrates a simple example of our variant, where we chose the best 3×3 region among 9 possible regions. In our implementation, we also increased the resolution of the grid and considered 5×5-cell regions instead of 3×3-cell regions. The cell size was set to D × D = 50 × 50 meters, so the final stay regions measure 250 × 250 meters. This is large enough to handle the WiFi-location noise that occurs in some places. Compared to the original grid clustering algorithm, our variant finds the best fit of stay regions for the set of stay points, and therefore generates fewer but more accurate stay regions.

High-level representation. Once the visited places are detected, we can remap the sequence of raw coordinates into a sequence of place IDs. Not all coordinates belong to a place, because some correspond to the motion of the user. The place ID is set to NULL for these transition points. The sequence of visits is then derived from the sequence of place IDs. A visit is a time interval with a single non-null place ID that lasts at least 20 minutes. (The duration of a visit is defined as the time difference between the last point and the first point of the interval.) For a given user $u$, the high-level representation of location data is then $v^{(u,1)}, \dots, v^{(u,N_u)}$. Here $v^{(u,i)}$ denotes the $i$-th visit of user $u$, and $N_u$ is the total number of visits for that user. Each visit is characterized by its place ID $v^{(u,i)}.id$, its check-in time $v^{(u,i)}.time$, and its duration $v^{(u,i)}.duration$.
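The modified grid clustering under Place discovery can be illustrated with a rough Python sketch. It assumes that stay points have already been detected and projected to planar coordinates in meters. The function name grid_cluster, the tie-breaking rule (keep the first window with the strictly largest coverage), and the output format are our own illustrative choices, not the authors' implementation. The defaults of 50 m cells and 5 × 5 windows mirror the settings reported above.

```python
from collections import Counter


def grid_cluster(points, cell=50.0, win=5):
    """Greedily group stay points into square stay regions of win x win cells.

    points: list of (x, y) stay-point coordinates in meters (planar projection assumed).
    Returns a list of (origin_cell, point_indices), where origin_cell is the
    lower-left cell index (cx, cy) of the chosen window.
    """
    # Map every stay point to the integer index of its grid cell.
    cells = [(int(x // cell), int(y // cell)) for x, y in points]
    unassigned = set(range(len(points)))
    regions = []
    while unassigned:
        # Densest cell, counting only stay points not yet assigned to a region.
        density = Counter(cells[i] for i in unassigned)
        (hx, hy), _ = density.most_common(1)[0]
        best_origin, best_cover = None, []
        # Modified step: try every win x win window that contains the densest
        # cell and keep the one covering the most unassigned stay points.
        for ox in range(hx - win + 1, hx + 1):
            for oy in range(hy - win + 1, hy + 1):
                cover = [i for i in sorted(unassigned)
                         if ox <= cells[i][0] < ox + win
                         and oy <= cells[i][1] < oy + win]
                if len(cover) > len(best_cover):
                    best_origin, best_cover = (ox, oy), cover
        regions.append((best_origin, best_cover))
        unassigned.difference_update(best_cover)
    return regions


# Two points in cell (0, 0), one in cell (2, 0), one in cell (3, 0); 3 x 3 windows.
demo = [(10, 10), (20, 20), (110, 10), (160, 10)]
for origin, members in grid_cluster(demo, cell=50.0, win=3):
    print(origin, members)
# (0, -2) [0, 1, 2]
# (1, -2) [3]
```

In the demo, a 3 × 3 window centered on the densest cell would cover only two points. Shifting the window so that it still contains the densest cell lets it cover three, which is the gain the modified algorithm aims for.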
Figure 1. Place discovery from stay points: (a) stay points and grid clustering; (b) modified grid clustering. The grid clustering can be viewed as the process of finding the set of square regions that cover all detected stay points.

Figure 2. The sequence of visits (colored by place ID) and the two prediction tasks.

CONDITIONAL MODEL FOR MOBILITY PREDICTION
In this paper, we consider human mobility prediction in an online setting. The user model updates its parameters after each visit in order to predict future visits. In other words, we simulate a personalized smartphone application that learns user mobility patterns and makes predictions on the fly. The available data increases continuously, so the model knows more and more about user habits. We therefore hypothesize that the prediction accuracy could improve over time.

Figure 3. Conditional models.

Conditional model. We consider a probabilistic approach in which the context and the outcome are modeled as random variables. Formally, we learn the distribution $P(Y|X)$, where $X = \{X_1, X_2, \dots, X_F\}$ denotes the set of $F$ contextual variables and $Y$ denotes the outcome variable. For example, if both the contextual variables and the outcome variable are discrete, we can estimate the conditional probability from observations as

$$p(Y=y \mid X=x) = \frac{n_{x,y} + \alpha}{\sum_{y'} \left(n_{x,y'} + \alpha\right)},$$

where $n_{x,y}$ is the number of times the context-outcome pair $(x,y)$ is observed in the learning data and $\alpha$ is a (small) regularization factor. To simplify the presentation, we hereafter use the notation $p(y|x)$ for the conditional probability $p(Y=y|X=x)$.
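The regularized count estimate above can be written as a minimal Python sketch. Contexts are represented as hashable tuples of contextual-variable values, such as (LOC, HOUR); this representation and the class name CountModel are illustrative assumptions. The demo uses α = 1.0 only to make the numbers easy to check by hand; the text calls for a small α.

```python
from collections import Counter


class CountModel:
    """Regularized estimate of p(y | x) from observed (context, outcome) pairs."""

    def __init__(self, outputs, alpha=0.1):
        self.outputs = list(outputs)  # the discrete output space Y
        self.alpha = alpha            # smoothing constant
        self.counts = {}              # context x -> Counter of outcomes y

    def update(self, x, y):
        self.counts.setdefault(x, Counter())[y] += 1

    def prob(self, y, x):
        c = self.counts.get(x, Counter())
        n_x = sum(c.values())
        # (n_{x,y} + alpha) / sum_{y'} (n_{x,y'} + alpha) = (n_{x,y} + alpha) / (n_x + alpha * |Y|)
        return (c[y] + self.alpha) / (n_x + self.alpha * len(self.outputs))


model = CountModel(["home", "work", "gym"], alpha=1.0)
for _ in range(2):
    model.update(("home", 8), "work")   # context = (LOC, HOUR)
model.update(("home", 8), "gym")
print(model.prob("work", ("home", 8)))   # 0.5
print(model.prob("work", ("work", 18)))  # unseen context falls back to uniform, 1/3
```

For an unseen context every count is zero, so the estimate falls back to the uniform distribution. This is the regularizing role of α described in the text.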
Clearly, the more accurate the context is, the less uncertain the outcome might be. In other words, the model is more accurate if we consider more relevant contextual variables. For example, the combination of current location and time could be more effective than location or time alone. However, if the contextual space $X$ is large, we need more data to estimate the distribution $P(Y|X)$ robustly. The main challenge is then to build a model that efficiently exploits the available contextual variables from limited data samples.

One extreme is the full conditional model (Fig. 3(a)), in which the outcome variable $Y$ depends on all contextual variables $X_f$ jointly. This would be the ideal model if we had enough samples from the true distribution and computational power were not a problem. At the other extreme, one could consider a Naive Bayes model, which characterizes the dependency between $Y$ and each contextual variable $X_f$ separately. However, the Naive Bayes model is based on the (strong) hypothesis that the contextual variables are independent given the outcome.

Ensemble method. We propose an intermediate solution between these two extremes. It factorizes the distribution into multiple conditional distributions, each of which takes a subset of $X$ as context (Fig. 3(b)). Formally,

$$P(Y \mid X) = \frac{1}{Z(X)} \prod_{k=1}^{K} P_k(Y \mid C_k)^{w_k}, \qquad (1)$$

where $C_k \subset X$ denotes a subset of contextual variables, $w_k$ is a weighting factor for the distribution $P_k$, and $Z(X)$ is a normalization constant. This is analogous to a logarithmic opinion pool of $K$ conditional distributions given by $K$ models. Each model uses a specific subset of contextual variables and defines a distribution over the output space. We do not require the sets of contextual variables to be distinct. Instead, we can use several models on the same set of contextual variables to improve the predictive performance. The weight vector $w = (w_1, \dots, w_K)$ is a parameter of the combination, and there are several ways to learn it from the observations. In effect, the learned weights should favor the more accurate models.

Formulation of the prediction tasks
The problem of predicting human behavior can be formulated as finding a function $f: X \to Y$, where $X$ is the contextual space and $Y$ is the behavioral output space. Let $\{x^{(u,i)}, y^{(u,i)}\}_{u=1..U,\, i=1..N_u}$ be a training set of $U$ users, in which user $u$ has $N_u$ context-output pairs. In our online setting, to predict the output $y^{(u,i)}$ of the context $x^{(u,i)}$, we can only use the data $\{x^{(u,j)}, y^{(u,j)}\}_{j=1..(i-1)}$ to learn the function $f$. Both the context space $X$ and the output space $Y$ depend on the task. To improve the predictive performance, we can also enrich the context beyond location and time with other information available on the phone. Examples include the density of nearby Bluetooth devices and the various phone application logs that might indicate the user's state.

In this setting, we investigate two prediction tasks, illustrated in Figure 2:
Task 1 (next place prediction): predict the place ID of the next visit, $v^{(u,i+1)}.id$, given knowledge of $v^{(u,i)}$.
Task 2 (duration prediction): predict the stay duration of the $i$-th visit, made to place $v^{(u,i)}.id$ with arrival time $v^{(u,i)}.time$.

Table 1. Contextual variables for the two prediction tasks. Temporal variables (X2, X3, X4) are computed with the leaving time for task 1 and with the arrival time for task 2.

Var | Name | Description | Task
X1 | LOC | ID of the current place | 1 & 2
X2 | HOUR | hour of the day | 1 & 2
X3 | DOW | day of the week (from Monday to Sunday) | 1 & 2
X4 | WE | workday/weekend indicator | 1 & 2
X5 | FREQ | frequency of visits to the current place, discretized into 5 levels of monthly frequency separated by 1, 4, 10, and 30 visits per month | 1 & 2
X6 | DUR | average visit duration at the current place, discretized into 4 levels separated by 1 hour, 2 hours, and 4 hours | 1
X7 | BT | number of nearby Bluetooth devices during the first 10 minutes of the visit, discretized into 5 levels separated by 1, 2, 4, and 8 devices | 2
X8 | PC | binary variable indicating whether the user calls or sends an SMS to someone during the first 10 minutes of the visit | 2

Let $x_k$ be the shortened context of $x$ that uses only the contextual variables in $C_k$. The model performs the following steps for predicting $y^{(u,i)}$:
Step 1: For each model $k$, update the model parameters based on the mobility history of user $u$, $\{x_k^{(u,j)}, y^{(u,j)}\}_{j=1..(i-1)}$.
Step 2: Compute the conditional output distributions $P_k(Y \mid C_k = x_k^{(u,i)})$.
Step 3: Compute the output of the combined model, $y^{*} = \arg\max_y \prod_{k=1}^{K} p_k(y \mid x_k^{(u,i)})^{w_k}$.

The above prediction process has two types of model parameters: the parameters of each individual conditional distribution $P_k(\cdot)$, and the combination weight vector $w$. The conditional distributions $P_k(\cdot)$ are estimated from the mobility history of a single user $u$ (i.e., they are personalized models). The combination weights, in contrast, can be shared between users. In practice, we can learn $w$ on a training data set to obtain a user-independent solution. The learning of $w$ depends on the output space and is detailed in the following sections.

So far, we have presented our general predictive model. The next sections present in detail the models for the two basic prediction tasks, which correspond to two settings: a discrete output space and a continuous output space.
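Steps 1 to 3 can be tied together in a small Python sketch of the online loop for one user. Several things here are illustrative assumptions. Contexts are tuples, and each subset $C_k$ is given as a tuple of variable indices. Each $P_k$ uses the regularized count estimate. The weights are hand-picked stand-ins for a learned $w$. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def flat_prob(counts, y, n_outputs, alpha):
    """Regularized count estimate for one model's shortened context."""
    n_x = sum(counts.values())
    return (counts[y] + alpha) / (n_x + alpha * n_outputs)


def online_ensemble(events, subsets, weights, outputs, alpha=0.1):
    """Predict each outcome before seeing it, then update (online setting).

    events:  time-ordered list of (context_tuple, true_outcome) for one user.
    subsets: one tuple of variable indices C_k per model k.
    weights: combination weights w_k (assumed already learned).
    """
    tables = [defaultdict(Counter) for _ in subsets]  # per-model count tables
    predictions = []
    for x, y_true in events:
        # Shortened contexts x_k for every model.
        short = [tuple(x[i] for i in idx) for idx in subsets]
        # Steps 2 and 3: log of prod_k p_k(y | x_k)^{w_k}; Z(X) does not change the argmax.
        score = {
            y: sum(w * math.log(flat_prob(tables[k][short[k]], y, len(outputs), alpha))
                   for k, w in enumerate(weights))
            for y in outputs
        }
        predictions.append(max(score, key=score.get))
        # Step 1 (for the next visit): add this observation to every model's history.
        for k in range(len(subsets)):
            tables[k][short[k]][y_true] += 1
    return predictions


events = [(("home", 8), "work"), (("work", 18), "home"),
          (("home", 8), "work"), (("work", 18), "home")]
print(online_ensemble(events, subsets=[(0,), (0, 1)], weights=[1.0, 0.5],
                      outputs=["home", "work"]))
# ['home', 'home', 'work', 'home']  (the first two are ties, broken by list order)
```

In this sketch, model 0 uses LOC only and model 1 uses LOC and HOUR. Updating after the prediction is equivalent to Step 1 at the next visit, and it guarantees that each prediction only uses data from earlier visits.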
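PREDICTING NEXT PLACE
For next place prediction, the outcome can be one of the region IDs that have been previously visited, or a new place. Formally, at time $t$, the output space is

$$Y = \{v^{(i)}.id \mid v^{(i)}.time \le t\} \cup \{\text{NewPlace}\},$$

where NewPlace corresponds to any previously non-visited place. Since the output space is discrete, we consider a multinomial distribution over the set of possible destinations, including the special category NewPlace. For this task, we build the context at visit $i$ from location and time, as given in Table 1. The previous location is also relevant for the prediction task, but we did not use it because this information is usually unavailable due to missing data. Most of these variables involve the current visit only. The two variables X5 and X6, however, accumulate information from past mobility. They group places into categories depending on the frequency and duration of visits (user behavior is expected to be similar for places of the same category). A place category is less specific than the actual place. It is nevertheless helpful for infrequently visited places, for which the number of observations per place is limited.

Parameter estimation. For predicting $y^{(u,i)}$ given $x_k^{(u,i)} = x_k$, we can use the direct estimate of the conditional distribution (called "flat" in the discussion that follows):

$$p_k^{\text{flat}}(y \mid x_k) = \frac{n_{x_k,y} + \alpha}{n_{x_k} + \alpha\,|Y|}, \qquad (2)$$

where $n_{x_k,y} = \sum_{j=1}^{i-1} \mathbb{1}[x_k^{(u,j)} = x_k \wedge y^{(u,j)} = y]$ denotes the number of times the user went to place $y$ given the current context $x_k$, and $n_{x_k} = \sum_{j=1}^{i-1} \mathbb{1}[x_k^{(u,j)} = x_k]$ is the number of occurrences of $x_k$. In this formulation, the factor $\alpha$ regularizes the estimate towards the uniform distribution, especially when the counts are small. This effect becomes less important when the context $x_k$ is popular (i.e., when the denominator is large), in which case the estimate of the conditional probability is more accurate.

The flat estimate in Eq. 2 treats every observation equally, regardless of time. Suppose the user changes his behavior over time (e.g., moves to a new home or changes job). The flat estimate is then strongly biased by the earlier mobility patterns. This can be addressed by introducing time into the estimation. Let $(q_1, \dots, q_{n_{x_k}})$ be the index sequence of occurrences of the context $x_k$, in decreasing order of timestamp. The weighted estimate, which focuses on recent observations, computes the conditional probability as

$$p_k^{\text{weighted}}(y \mid x_k) = \frac{\sum_{j=1}^{n_{x_k}} \frac{1}{j}\,\mathbb{1}[y^{(u,q_j)} = y] + \alpha}{\sum_{j=1}^{n_{x_k}} \frac{1}{j} + \alpha\,|Y|}, \qquad (3)$$

where each observation is weighted by $1/j$, with $j$ the number of occurrences of the context $x_k$ since the visit $v^{(u,q_j)}$. In our experiments, we used both the flat and the weighted estimation models (cf. Table 2).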
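The difference between Eq. 2 and Eq. 3 shows up in a small Python sketch. Here the outcomes observed in one context are passed as a list ordered oldest first, so the most recent observation gets $j = 1$. The list-based interface, the function names, and the toy "moved house" history are illustrative assumptions.

```python
def flat_prob(history, y, n_outputs, alpha=0.1):
    """Eq. 2: every past outcome observed in this context counts once."""
    return (history.count(y) + alpha) / (len(history) + alpha * n_outputs)


def weighted_prob(history, y, n_outputs, alpha=0.1):
    """Eq. 3: the j-th most recent outcome (j = 1 for the latest) gets weight 1/j."""
    num, den = alpha, alpha * n_outputs
    for j, y_obs in enumerate(reversed(history), start=1):
        num += (1.0 / j) if y_obs == y else 0.0
        den += 1.0 / j
    return num / den


# Destinations observed in one context, oldest first: the user moved house.
history = ["old_home"] * 5 + ["new_home"] * 2
space = ["old_home", "new_home", "NewPlace"]
for y in space:
    print(y, round(flat_prob(history, y, len(space)), 3),
          round(weighted_prob(history, y, len(space)), 3))
# The flat estimate still favors old_home; the weighted estimate favors new_home.
```

Both estimators sum to one over the output space, because the denominator adds up the same weights that appear in the numerators, plus $\alpha|Y|$.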
Learning the combination weight. This section describes how to learn the combination weights for the next place prediction task. As discussed for the general conditional model, the combination weights can be shared across users, even though the basic models are personalized and trained for each user separately. Let $\{x^{(u,i)}, y^{(u,i)}\}_{u=1..U,\, i=1..N_u}$ be a training set for learning $w$. For each example $(u,i)$, we always have $K$ output distributions, coming from $K$ personalized models trained on the subset of data $\{x^{(u,j)}, y^{(u,j)}\}_{j=1..(i-1)}$. Let $p_{k,y}^{(u,i)} = p_k(y \mid x_k^{(u,i)})$ be the conditional probability provided by the $k$-th model. The combined conditional probability can then be written as

$$p_w(y \mid x^{(u,i)}) = \frac{\prod_{k=1}^{K} \left(p_{k,y}^{(u,i)}\right)^{w_k}}{Z(x^{(u,i)})}, \qquad (4)$$

where $Z(x^{(u,i)}) = \sum_{y} \prod_{k=1}^{K} \left(p_{k,y}^{(u,i)}\right)^{w_k}$ is the normalization term. Ideally, we want to find a weight vector $w$ such that the probability of the true destination, $p_w(y^{(u,i)} \mid x^{(u,i)})$, is higher than that of any other possible destination. Formally, we want

$$\prod_{k=1}^{K} \left(p_{k,y^{(u,i)}}^{(u,i)}\right)^{w_k} > \prod_{k=1}^{K} \left(p_{k,y}^{(u,i)}\right)^{w_k} \;\; \forall y \ne y^{(u,i)} \iff \left\langle \ln p_{\cdot,y^{(u,i)}}^{(u,i)}, w \right\rangle > \left\langle \ln p_{\cdot,y}^{(u,i)}, w \right\rangle \;\; \forall y \ne y^{(u,i)},$$

where $p_{\cdot,y}^{(u,i)}$ denotes the $K$-dimensional vector of probabilities of output $y$ given by all $K$ models. These inequalities are very similar to a ranking problem, and $w$ can be learned by minimizing the objective

$$\frac{\lambda}{2}\|w\|^2 + \sum_{u,i} \sum_{y \ne y^{(u,i)}} \underbrace{\max\left(0,\; 1 - \left\langle \ln p_{\cdot,y^{(u,i)}}^{(u,i)} - \ln p_{\cdot,y}^{(u,i)},\, w \right\rangle\right)}_{\text{hinge loss}}. \qquad (5)$$
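The objective in Eq. (5) is convex in $w$, so plain subgradient descent is one simple way to minimize it. The text does not say which optimizer was used, so the batch subgradient method, the step size, the epoch count, and the toy data below are all illustrative assumptions. Each training example carries, for every candidate destination, the vector of log-probabilities from the $K$ models, together with the index of the true destination.

```python
import math


def train_weights(examples, n_models, lam=0.01, lr=0.01, epochs=200):
    """Batch subgradient descent on the objective of Eq. (5).

    examples: list of (logp, t); logp[y][k] = ln p_k(y | x_k) for every candidate
              destination y, and t is the index of the true destination.
    """
    w = [0.0] * n_models
    for _ in range(epochs):
        grad = [lam * wk for wk in w]  # gradient of (lam / 2) * ||w||^2
        for logp, t in examples:
            for y, row in enumerate(logp):
                if y == t:
                    continue
                diff = [a - b for a, b in zip(logp[t], row)]
                margin = sum(d * wk for d, wk in zip(diff, w))
                if margin < 1.0:  # hinge term is active: its subgradient is -diff
                    for k in range(n_models):
                        grad[k] -= diff[k]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


ln = math.log
# Model 0 always ranks the true destination first; model 1 is misleading or flat.
examples = [
    ([[ln(0.8), ln(0.2)], [ln(0.2), ln(0.8)]], 0),
    ([[ln(0.2), ln(0.5)], [ln(0.8), ln(0.5)]], 1),
]
print(train_weights(examples, n_models=2))  # model 0 receives the larger weight
```

In the toy run, the reliable model ends up with a large positive weight, while the misleading one is pushed below zero. The text notes that this training step runs only once, offline, so its cost is not a concern on the phone.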
... standard deviation, respectively, in log-space. This is equivalent to a mixture of normal distributions on the logarithm of the duration. Figure 4 illustrates the distribution of visit durations at work for one user in our data. There are two peaks, at 4 and 8 hours. These could be due to a higher likelihood of having lunch outside, or of staying at the workplace the entire day.

Figure 4. Distribution of visit duration at a given place: (a) histogram (durations from 30 min to 8 h); (b) mixture of log-normal distributions.

For predicting $y^{(u,i)}$ given $x_k^{(u,i)} = x_k$, we first select the ...

Note, however, that these filtering methods assume that the raw location data is always accurate (e.g., to within 50 meters). This is generally true for the location estimation method used in our framework. In practice, however, we detected a small amount of inaccurate data after manually checking some abnormal (most likely impossible) behaviors, such as multiple visits to the same place during the night. Unfortunately, we do not have a fully reliable way to detect and filter out these anomalies, which come mainly from the data sources (e.g., WiFi instabilities).

Final data for prediction tasks. The processed data consists of 98,000 visits from 153 users, an average of 2.5 visits per user per active day. On average, the place discovery algorithm outputs 37 distinct places per user, but this number varies significantly from user to user. As can be seen in Figure 5, the largest fraction of users visited 20 to 80 distinct places in which they stayed for at least 20 minutes. After filtering, we obtain 30,000 trusted transitions for next place prediction and 41,000 trusted visits for duration prediction. This corresponds to 0.6 trusted transitions and 0.8 trusted visits per user per day over each user's recording period. User behavior models are trained from trusted transitions and trusted visits. The relatively low rate of trusted observations is therefore expected to directly affect predictive performance, especially for infrequent places, which already have few observations. For frequently visited places, one can expect the model to make accurate predictions in the long term.

To study the distribution of visits over places, we sort the places of each user by their total number of visits. Figure 6 then reports the percentage of trusted transitions for each place-rank of the destination (most visited, second most visited, and so on). To reduce the influence of missing data, we only consider trusted transitions. On one hand, a few top places account for a very large fraction of visits compared to the rest. On the other hand, occasional visits are individually rare but not negligible in total: 10% of trusted transitions have a destination outside the top-10 places. Only a few top places are involved in daily routines. Even so, the mobility patterns in our data are quite complex: 18% of the trusted transitions are loop-transitions (i.e., $v^{(u,i)}.id = v^{(u,i+1)}.id$), some of which could be caused by erroneous raw location data.

Figure 6. Fraction of trusted transitions for each rank of the destination, ordered by number of visits.

We use leave-one-user-out cross-validation. The combination weight vector is trained on data from 152 users and then used to make predictions for every trusted observation of the remaining user. Recall that to make a prediction at time $t$, all model parameters are estimated with the user's data up to time $t$. In other words, the model always predicts the future.

Next place prediction results
Accuracy. Table 2 reports the all-time average prediction accuracy over the set of trusted transitions. The baseline accuracy is 0.411. The baseline model outputs the most visited place up to prediction time, with one exception: if the user is currently at the most visited place, it predicts the second most visited place. As described earlier, we build single models on various sets of contextual variables and then combine them into an ensemble model that exploits all the contextual information. For each set of variables, we considered two models, one for each way of estimating the probability (flat and weighted). The sets were obtained by a heuristic greedy process. It starts with individual features and then adds features to the existing sets until the performance stops improving. LOC is more specific than the place categories FREQ and DUR, so combining the actual location with its categories does not enrich the context. Finally, we also consider the set of all contextual variables.

Among the contextual variables, LOC is the most important for predicting the next place, with an accuracy of 0.59 using weighted estimation. In general, weighted estimation is slightly better than flat estimation, which can be explained by the fact that people change their mobility patterns over time. The model that uses only location with flat estimation is equivalent to a first-order Markov chain defined on the set of destinations. HOUR is the second most important contextual variable, and the combination of location and hour gives the best single model, with an accuracy of 0.604. Richer context does not improve the accuracy, which seems to be due to the sparsity of observed data in the contextual space. For example, the models that consider all the contextual variables reach an accuracy of only about 0.53, worse than using LOC or HOUR alone.

A rich contextual set does not improve the performance of a single predictor, but it can still contribute to the final solution of the ensemble method. Our ensemble method successfully exploits the large number of contextual variables by combining the output probabilities of multiple models. Using prior information from these models, we reach an all-time average accuracy of 0.64. Figure 7 illustrates the weights learned for each single model: almost all selected subsets of contextual variables contribute to the final solution.

Figure 7. Learned $w$ of the ensemble model for next place prediction.

Figure 8. Next place prediction accuracy vs. (a) the number of previously observed transitions and (b) the recording time (in weeks). The dark curve is the average estimated on all users; each fine curve corresponds to one user.

Figure 9. Human mobility in the weekly calendar (day of the week vs. time of day): (a) number of transitions; (b) accuracy.
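The 0.411 baseline described under Accuracy can be sketched in Python as follows. We assume "most visited" means the visit count over the user's history up to prediction time. Returning None when there is no history, and returning the top place when it is the only known place, are our own assumptions, since the text does not cover these edge cases. Ties follow Counter.most_common, which keeps first-seen order.

```python
from collections import Counter


def baseline_next_place(past_places, current_place):
    """Baseline from the text: predict the most visited place so far, or the
    second most visited one if the user is currently at the most visited place."""
    ranking = [place for place, _ in Counter(past_places).most_common()]
    if not ranking:
        return None  # no history yet (our assumption: abstain)
    if ranking[0] == current_place and len(ranking) > 1:
        return ranking[1]
    return ranking[0]  # also the fallback when only one place is known (assumption)


history = ["home", "work", "home", "gym", "home", "work"]
print(baseline_next_place(history, "work"))  # home
print(baseline_next_place(history, "home"))  # work
```

Unlike the single models, this baseline ignores time and any context beyond the current place.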
Predictability over time. Our personal model parameters are updated after each observation, so we can expect the prediction accuracy to improve over time. Figure 8(a) illustrates the accuracy of the ensemble model as a function of the number of previously observed transitions. Kernel density estimation was used to estimate the accuracy at a given number of trusted transitions (we used a normal kernel with $\sigma = 4 + 0.2n$, where $n$ is the number of transitions). The dark curve is estimated on the whole set of trusted transitions for all users, and each fine curve is estimated on the data of a single user. The number of transitions varies from user to user, which results in curves of different lengths. As expected, the accuracy tends to increase as the number of observed transitions increases. However, accuracy varies widely at the beginning (10 to 50 transitions), and this variance decreases as the number of transitions grows. Also, many user curves are not monotone, which again suggests that user mobility patterns might change over time. Figure 8(b) shows the evolution of accuracy as a function of the time elapsed since the first trusted transition. To improve the readability of the figure, we only show curves for people who contributed at least 6 months of data and have at least 300 trusted transitions. Again, we observe that the accuracy generally improves over time. However, the correlation between accuracy and recording time seems weaker than the correlation between accuracy and the number of trusted transitions. This suggests that the number of observations plays a central role in prediction.

Finally, we study the predictability of human mobility with respect to the weekly calendar. Figures 9(a) and (b) show, respectively, the number of trusted transitions and the accuracy of the ensemble model in each 30-minute time slot of the weekly calendar. The number of transitions reflects people's daily movements in real life, such as going to work in the morning or having lunch at noon on weekdays. There are few transitions during the night, but their destinations are quite easy to predict (probably going home). In contrast, 3-4pm is the least active period of office hours, yet its predictability is low, as shown by the dark colors in Figure 9(b). Transitions on weekends are the hardest to predict, especially from 9am to 4pm. It is also interesting that the accuracy on Sunday evening is higher than on Saturday evening. This reflects the pattern of going home on Sunday evening to get ready for work on Monday.
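The smoothed accuracy curves of Figure 8(a) can be approximated in Python. We read "kernel density estimation of the accuracy" as a kernel-weighted average of per-prediction correctness, in the style of Nadaraya-Watson. That reading is our assumption. The bandwidth $\sigma = 4 + 0.2n$ is taken from the text. The function name and the toy data are hypothetical.

```python
import math


def smoothed_accuracy(ns, correct, grid):
    """Kernel-weighted accuracy as a function of the number of observed transitions.

    ns[i]:      number of previously observed transitions when prediction i was made
    correct[i]: 1 if prediction i was right, else 0
    grid:       values of n at which to evaluate the curve (assumed within the data range)
    A normal kernel with width sigma = 4 + 0.2 * n is used at each grid point.
    """
    curve = []
    for n in grid:
        sigma = 4.0 + 0.2 * n
        weights = [math.exp(-0.5 * ((n - m) / sigma) ** 2) for m in ns]
        curve.append(sum(w * c for w, c in zip(weights, correct)) / sum(weights))
    return curve


# Toy user: wrong for the first 50 transitions, right afterwards.
ns = list(range(100))
correct = [0] * 50 + [1] * 50
print([round(a, 2) for a in smoothed_accuracy(ns, correct, grid=[0, 50, 99])])
# Accuracy rises from near 0 to above 0.9; the kernel widens as n grows.
```

A bandwidth that grows with $n$ keeps the early part of the curve sharp, where observations are dense in $n$. Later, where users with long histories are few, it smooths more heavily.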
Thanks to its user-independent nature, the general model can be used for new users without retraining. However, we cannot expect this model to perform optimally, since each user has different mobility patterns.

The results of the personalized models are reported in Table 4. We found that the best single personalized model is based on LOC and HOUR, giving a relative error of 0.42, which is slightly better than the general model (0.441). By combining the best general model with the personalized models, the relative error of the best single model drops significantly to 0.376. Moreover, the ensemble method over the general+personalized models further improves performance: our ensemble method (called ensemble distribution) reduces the relative error to 0.355. We also implemented a linear combination approach (called ensemble output), a popular combination method that uses only the output values of the basic models. As can be seen, this baseline approach for combining multiple models improves only very slightly on the best general+personalized model (0.375 vs. 0.376). This result emphasizes the strength of our approach, which combines multiple models efficiently.

Table 4
                         Personalized          General + personalized
Feature set              Dur. mod.  Lev. mod.  Dur. mod.  Lev. mod.
LOC                      0.423      0.562      0.407      0.543
LOC + HOUR               0.420      0.428      0.376      0.384
LOC + HOUR + WE          0.429      0.437      0.377      0.385
LOC + HOUR + DOW         0.477      0.481      0.398      0.403
BT + PC + HOUR + WE      0.490      0.497      0.436      0.443
FREQ + WE                0.481      0.652      0.474      0.643
FREQ + HOUR + WE         0.444      0.455      0.404      0.413
FREQ + BT                0.480      0.627      0.468      0.613
FREQ + BT + WE           0.483      0.616      0.465      0.596
FREQ + BT + HOUR + WE    0.474      0.480      0.411      0.417
Ensemble distribution    0.355
Ensemble output          0.375
Dur. mod. = Duration model; Lev. mod. = Leaving time model.

Finally, we study the average error conditioned on the stay duration. The set of visits is divided into 5 categories: less than 1h, 1-2h, 2-4h, 4-8h, and more than 8h. The histogram of stay durations in Figure 10 (top) shows that short visits (less than 1 hour) and long visits (more than 8 hours) are the two most frequent categories. However, the average errors for these two categories are rather different. Figure 10 (bottom) shows that the relative error for long visits is significantly lower than for short visits (0.254 vs. 0.420), which means that most prediction errors come from short visits. To provide a more common evaluation measure, we consider a classification task in which the predicted and actual durations are mapped into the above 5 categories, and we report the confusion matrix (see Table 5). Interestingly, despite their high relative error, short visits of less than 1 hour have a classification accuracy of 63%, second only to the longest category (77%).

Table 5. Confusion matrix of duration prediction and accuracy per category. Rows correspond to actual duration and columns correspond to predicted duration. The overall classification accuracy is 0.56.
Actual \ Predicted   <1h   1-2h   2-4h   4-8h   >8h    Accuracy
>8h                  270   428    492    1087   7490   0.77

Algorithmic complexity of our framework
The computational cost of the framework consists of learning the combination weights, updating the user behavior model, and making predictions. Learning the combination weights is the most expensive part; it is done on a dedicated data set used to calibrate the weights of the single predictors (e.g., the objective function in Eq. (5) involves 480,000 hinge loss terms in our experiment). However, the weight vector w is estimated only once, before the actual use of the predictive module, so this computation would not represent a load for a mobile device running the prediction application.

For next place prediction, the output space is discrete; therefore, the cost of updating parameters (basically, updating the destination counts for a given context) and the cost of making predictions (combining conditional distributions, Eq. (4)) are low. In the case of duration prediction, both parameter updating and prediction require more computation. Note that, as we use a 1-D mixture model to represent the conditional distribution of the continuous output variable, the basic models have to be estimated with EM (in practice, fewer than 10 iterations). For predicting duration, recall that we relied on a step approximation of the continuous output distribution, since an analytic solution is not available. The cost of building the approximation and making predictions is then O(K × L), where K is the number of predictors (K = 36 in the duration prediction experiments) and L is the number of steps in the approximation (in practice, we set L = 500, corresponding to roughly 1% relative error on the duration), i.e., on the order of 36 × 500 = 18,000 evaluations per prediction. Hence, the overall on-device computational cost is relatively low and can be handled by most modern mobile devices.
CONCLUSION
In this paper, we developed a general framework for predicting human mobility behavior and applied it to two specific tasks. Our approach can be viewed as an ensemble method that combines a set of models, each learning different mobility patterns from past observations. On a difficult real-life data set, we demonstrated the potential of our approach for predicting human mobility in real-world conditions. While human mobility is not always predictable, repetitive routines can be learned and predicted from contextual cues sensed by smartphones. This set of contextual cues can be exploited efficiently and in a principled way using ensemble methods, leading to improved performance.

One issue we encountered is the low rate of trusted observations; this could be addressed by improving the sensing technique and/or by exploiting untrusted or missing observations in the learning framework. Among many other directions to explore, we would like to consider more types of contextual variables and other mobility patterns to improve predictive performance. This would increase both the number of contextual variables and the number of basic models capturing different types of mobility patterns. Since our approach is general, we are also interested in applying it to other prediction tasks, including human behavior beyond mobility. This paper considers the prediction of user mobility when users arrive at or leave a place; in practice, it could be relevant to predict user behavior whenever the context changes. Finally, while this work considers only discrete contextual variables to simplify the estimation of the basic models, we also plan to study other types of contextual variables or to use non-parametric methods for the basic models.

Acknowledgments
This work was funded by Nokia Research Center Lausanne (NRC) through the LS-CONTEXT project.

REFERENCES
1. R. Bajaj, S. L. Ranaweera, and D. P. Agrawal. GPS: location-tracking technology. Computer, 35(4):92–94, 2002.
2. A. T. Campbell, S. B. Eisenman, N. D. Lane, E. Miluzzo, and R. A. Peterson. People-centric urban sensing. In Proc. WICON, 2006.
3. O. Chapelle and S. S. Keerthi. Efficient algorithms for ranking with SVMs. Inf. Retr., 13:201–215, June 2010.
4. E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proc. KDD, pages 1082–1090, 2011.
5. T.-M.-T. Do and D. Gatica-Perez. Groupus: Smartphone proximity data and human interaction type mining. In Proc. ISWC, June 2011.
6. N. Eagle and A. (Sandy) Pentland. Reality mining: sensing complex social systems. Personal Ubiquitous Comput., 10(4):255–268, 2006.
7. M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.
8. T. Heskes. Selecting weighting factors in logarithmic opinion pools. In Proc. NIPS, pages 266–272, 1998.
9. J. Hightower, S. Consolvo, A. LaMarca, I. Smith, and J. Hughes. Learning and recognizing the places we go. In Proc. UbiComp, pages 159–176, 2005.
10. J. H. Kang, W. Welbourne, B. Stewart, and G. Borriello. Extracting places from traces of locations. Proc. WMASH, 9(3):110–118, 2004.
11. M. Kim, D. Kotz, and S. Kim. Extracting a mobility model from real user traces. In Proc. INFOCOM, 2006.
12. J. Krumm and A. J. B. Brush. Learning time-based presence probabilities. In Proc. Pervasive, 2011.
13. J. Krumm and E. Horvitz. Predestination: Inferring destinations from partial trajectories. In Proc. Ubicomp, pages 243–260, 2006.
14. H. J. Kushner and G. Yin. Stochastic Approximation Algorithms and Applications. Springer Verlag, 1997.
15. T. Liu, P. Bahl, and I. Chlamtac. Mobility modeling, location tracking, and trajectory prediction in wireless ATM networks. Selected Areas in Communications, 16(6):922–936, 1998.
16. A. Monreale, F. Pinelli, R. Trasarti, and F. Giannotti. Wherenext: a location predictor on trajectory pattern mining. In Proc. KDD, pages 637–646, 2009.
17. P. Nurmi and S. Bhattacharya. Identifying meaningful places: The non-parametric way. In Proc. Pervasive Computing, 2008.
18. S. Patel, J. Kientz, G. Hayes, S. Bhat, and G. Abowd. Farther than you may think: An empirical investigation of the proximity of users to their mobile phones. In Proc. Ubicomp, pages 123–140, 2006.
19. A. Peddemors, H. Eertink, and I. Niemegeers. Predicting mobility events on personal devices. Pervasive Mob. Comput., 6:401–423, August 2010.
20. M. Raento, A. Oulasvirta, R. Petit, and H. Toivonen. Contextphone: A prototyping platform for context-aware mobile applications. IEEE Pervasive Computing, 4(2):51–59, 2005.
21. S. Scellato, M. Musolesi, C. Mascolo, V. Latora, and A. T. Campbell. Nextplace: a spatio-temporal prediction framework for pervasive systems. In Proc. Pervasive, pages 152–169, 2011.
22. C. Song, Z. Qu, N. Blumm, and A.-L. Barabási. Limits of predictability in human mobility. Science, 327(5968):1018–1021, 2010.
23. L. Song, D. Kotz, R. Jain, and X. He. Evaluating next-cell predictors with extensive Wi-Fi mobility data. In IEEE Transactions on Mobile Computing, pages 1414–1424, 2004.
24. L. Vu, Q. Do, and K. Nahrstedt. Jyotish: A novel framework for constructing predictive model of people movement from joint WiFi/Bluetooth trace. In PerCom, pages 54–62, 2011.
25. V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang. Collaborative location and activity recommendations with GPS history data. In Proc. WWW, ACM, 2010.